NAGS, CPS1, and SLC25A13 (Citrin) at the Crossroads of Arginine and Pyrimidines Metabolism in Tumor Cells

Urea cycle enzymes and transporters collectively convert ammonia into urea in the liver. Aberrant overexpression of carbamylphosphate synthetase 1 (CPS1) and SLC25A13 (citrin) genes has been associated with faster proliferation of tumor cells due to metabolic reprogramming that increases the activity of the CAD complex and pyrimidine biosynthesis. N-acetylglutamate (NAG), produced by NAG synthase (NAGS), is an essential activator of CPS1. Although NAGS is expressed in lung cancer derived cell lines, expression of the NAGS gene and its product was not evaluated in tumors with aberrant expression of CPS1 and citrin. We used data mining approaches to identify tumor types that exhibit aberrant overexpression of NAGS, CPS1, and citrin genes, and evaluated factors that may contribute to increased expression of the three genes and their products in tumors. Median expression of NAGS, CPS1, and citrin mRNA was higher in glioblastoma multiforme (GBM), glioma, and stomach adenocarcinoma (STAD) samples compared to the matched normal tissue. Median expression of CPS1 and citrin mRNA was higher in the lung adenocarcinoma (LUAD) samples while expression of NAGS mRNA did not differ. High NAGS expression was associated with an unfavorable outcome in patients with glioblastoma and GBM. Low NAGS expression was associated with an unfavorable outcome in patients with LUAD. Patterns of DNase hypersensitive sites and histone modifications in the upstream regulatory regions of NAGS, CPS1, and citrin genes were similar in liver tissue, lung tissue, and A549 lung adenocarcinoma cells despite different expression levels of the three genes in the liver and lung. Citrin gene copy numbers correlated with its mRNA expression in glioblastoma, GBM, LUAD, and STAD samples. There was little overlap between NAGS, CPS1, and citrin sequence variants found in patients with respective deficiencies, tumor samples, and individuals without known rare genetic diseases.
The correlation between NAGS, CPS1, and citrin mRNA expression in the individual glioblastoma, GBM, LUAD, and STAD samples was very weak. These results suggest that the increased cytoplasmic supply of either carbamylphosphate, produced by CPS1, or aspartate may be sufficient to promote tumorigenesis, as well as the need for an alternative explanation of CPS1 activity in the absence of NAGS expression and NAG.


Expression of NAGS, CPS1, and Citrin in Tumors and Patient Outcomes
We queried the cBioPortal database [42,43] to determine whether the expression of NAGS, CPS1, and citrin in glioblastoma multiforme, glioma, stomach adenocarcinoma, and lung adenocarcinoma correlated with the patient outcomes for these four tumor types. Patient outcome data for stomach and esophageal carcinoma are not available in cBioPortal; cBioPortal does have outcome data for patients with glioblastoma, which is a type of glioma. Therefore, we queried cBioPortal for the association between NAGS, CPS1, and citrin mRNA expression levels and outcomes of patients with glioblastoma, glioblastoma multiforme, lung adenocarcinoma, and stomach adenocarcinoma. Glioblastoma, glioblastoma multiforme, lung adenocarcinoma, and stomach adenocarcinoma samples were ranked by the expression level of NAGS, CPS1, or citrin mRNA and divided into quartiles, and the survival times of patients in the highest and the lowest expression quartiles were compared (Figure 2). The highest NAGS mRNA expression was associated with a worse outcome, i.e., shorter survival time, in patients with glioblastoma (Figure 2A). There was no association between CPS1 and citrin mRNA expression levels and the outcome of patients with glioblastoma (Figure 2B,C). There was a trend (p = 0.07) towards a worse outcome (shorter survival time) for patients with glioblastoma multiforme exhibiting the highest expression of NAGS mRNA, compared to glioblastoma multiforme patients exhibiting the lowest expression of NAGS mRNA (Figure 2D). Expression levels of CPS1 and citrin mRNA were not associated with the outcome of patients with glioblastoma multiforme (Figure 2E,F).
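The quartile comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure run inside cBioPortal: the function names, the input layout, and the choice of a two-group log-rank test are assumptions made here, since the text reports p values without naming the test.

```python
import math

def quartile_groups(expression):
    """Return (lowest, highest) quartile sample ids from {sample_id: mRNA level}."""
    if len(expression) < 4:
        raise ValueError("need at least four samples to form quartiles")
    ranked = sorted(expression, key=expression.get)
    k = len(ranked) // 4
    return ranked[:k], ranked[-k:]

def logrank_p(group_a, group_b):
    """Two-group log-rank test (assumed test, not stated in the text).

    Each group is a list of (time, event) pairs, event = 1 for death
    and 0 for a censored observation. Returns the p value (1 df).
    """
    data = [(t, e, 0) for t, e in group_a] + [(t, e, 1) for t, e in group_b]
    event_times = sorted({t for t, e, _ in data if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for u, _, g in data if g == 0 and u >= t)  # at risk, group A
        n_b = sum(1 for u, _, g in data if g == 1 and u >= t)  # at risk, group B
        d_a = sum(1 for u, e, g in data if g == 0 and u == t and e)
        d_b = sum(1 for u, e, g in data if g == 1 and u == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with one degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))
```

In use, the sample ids returned by `quartile_groups` would be mapped to their (survival time, event) pairs and the two resulting lists passed to `logrank_p`.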
There was a trend towards a worse outcome (p = 0.07) in patients exhibiting the lowest quartile of NAGS mRNA expression in lung adenocarcinoma (Figure 2G), while the outcome was worse for patients with the highest CPS1 mRNA expression in lung adenocarcinoma (Figure 2H). There was no association between citrin mRNA expression and outcome of patients with lung adenocarcinoma (Figure 2I).

Epigenetic Regulation of NAGS, CPS1, and Citrin in Lung Adenocarcinoma
Overexpression of NAGS and CPS1 genes in the brain, lung, and stomach tissues, where these two genes are not highly expressed, would require epigenetic changes that allow the transcription of the two genes. The citrin gene is expressed in most tissues due to its functioning in the malate-aspartate shuttle [26]. Therefore, the epigenetic regulation of citrin ought to be similar in the liver, brain, lung, and stomach. We sought to compare chromatin accessibility to transcription factors, histone modifications, binding of CTCF (a component of the cohesin complex that regulates chromatin domain organization), and binding of the POLR2A subunit of RNA polymerase II as indicators of molecular mechanisms that enable gene expression in the liver, where NAGS, CPS1, and citrin are highly expressed; in brain, lung, and stomach tissues; and in cancer cell lines that model glioblastoma, glioblastoma multiforme, lung adenocarcinoma, and stomach adenocarcinoma. DNase-Seq and ATAC-Seq data, which indicate chromatin accessibility to transcription factors, and ChIP-Seq data for CTCF, POLR2A, H3K4me3 (which indicates active gene transcription), and H3K27ac (which indicates active enhancers) were available in the ENCODE database for the liver and lung tissues, and for the A549 lung adenocarcinoma cell line.
The NAGS promoter, −3 kb enhancer, and a regulatory element in the first intron of the NAGS gene regulate its expression in the liver [44,45] and are among candidate cis-regulatory elements (cCRE) of the NAGS gene ( Figure 3A-C, highlighted in green). DNase-Seq and ATAC-Seq data indicate that the NAGS promoter and −3 kb enhancer are accessible to transcription factor binding in the liver ( Figure 3A), but not in the A549 cell line and lung tissue ( Figure 3B,C). An ATAC-Seq peak located approx. 9.5 kb upstream of the NAGS transcription start site (TSS) coincides with CTCF binding in the liver, A549 cells, and lung, suggesting that this region may represent a boundary between the NAGS locus and the upstream PYY locus ( Figure 3A-C). The −3 kb NAGS enhancer and intronic regulatory elements are associated with strong H3K27ac signals in the liver, indicating that these two regulatory elements are active enhancers in the liver ( Figure 3A). The H3K27ac signal is absent at the −3 kb enhancer in A549 cells and lung tissue ( Figure 3B,C), consistent with the liver-specific activity of this regulatory element [45]. The H3K27ac signal at the NAGS intronic regulatory element is markedly weaker in A549 cells and lung tissue, indicating low NAGS expression ( Figure 3B,C). The H3K4me3 and POLR2A signals were lower in A549 cells and lung tissue than in the liver ( Figure 3A-C). Taken together, these data are consistent with low NAGS expression in the A549 cells, lung adenocarcinoma, and lung tissue.
Expression of CPS1 is regulated by a proximal enhancer and promoter located immediately upstream of CPS1 TSS, and a distal enhancer located approx. 7.5 kb upstream of the CPS1 TSS [45,46]. These known CPS1 regulatory elements are associated with strong ATAC-Seq and DNase-Seq signals in the liver (Figures 3D and S1, highlighted in green). Upstream of the CPS1 distal enhancer are several cCREs, of which, some are associated with ATAC-Seq and DNase-Seq signals in the liver ( Figures 3D and S1A). The distal CPS1 enhancer is associated with strong ATAC-Seq and DNase-Seq signals in A549 cells ( Figures 3E and S1B), but not in the lung ( Figures 3F and S1C). The other known and predicted CPS1 upstream regulatory elements are associated with weak ATAC-Seq and DNase-Seq signals in the lung and A549 cells ( Figures 3E,F and S1). Approximately 35 kb upstream of the CPS1 TSS is a site that binds CTCF in the liver and is predicted to be a chromatin insulator (Figures 3D and S1A). This predicted chromatin insulator does not bind CTCF in the lung and A549 cells ( Figure 3E,F). Known CPS1 regulatory elements and some of the predicted CPS1 regulatory elements are associated with strong H3K27ac, H3K4me3, and POLR2A signals in the liver ( Figure 3D), which are weak in A549 cells ( Figure 3E) and absent in the lung tissue ( Figure 3F). This is consistent with aberrant CPS1 expression in lung adenocarcinoma.
Expression of the citrin gene is regulated by a recently characterized promoter [47,48]. This promoter is associated with strong ATAC-Seq, DNase-Seq, H3K27ac, H3K4me3, and POLR2A signals in the liver, A549 cells, and lung tissue, consistent with ubiquitous citrin gene expression (Figures 3G-I and S2). ATAC-Seq signals were associated with some of the cCREs in the first intron of the citrin gene and its upstream regulatory region (Figures 3G-I and S2). CTCF binding was not detected in the first intron of the citrin gene or within 25 kb upstream of the citrin TSS (Figure 3G-I). However, CTCF binding sites were present approx. 100 kb upstream of the citrin TSS and within intron 4 of the citrin gene (Figure S2). Similar epigenetic regulation of citrin in the liver and lung is consistent with its expression in many tissues, and could contribute to its aberrant expression in lung adenocarcinoma.

Sequence Variation and Expression in Tumor and Normal Tissues
Since the NAGS, CPS1, and citrin genes are not highly expressed in the brain, lung, and stomach, we explored molecular mechanisms that can contribute to elevated NAGS, CPS1, and citrin activities in tumors. Increased copy numbers of NAGS, CPS1, and citrin genes and/or activating sequence variants could result in higher activity of the three proteins in glioblastoma, glioblastoma multiforme, lung adenocarcinoma, and stomach adenocarcinoma. Structural variants resulting in increased copy numbers of NAGS, CPS1, and/or citrin loci could explain the overexpression of the three genes. Therefore, we queried cBioPortal for the number of glioblastoma, glioblastoma multiforme, lung adenocarcinoma, and stomach adenocarcinoma samples exhibiting NAGS, CPS1, and citrin gene copy number variation (CNV), and whether there is an association between CNVs and mRNA expression of the three genes in the four tumor types (Figure 4). CNVs are classified as deep deletions, shallow deletions, diploid, gains, and amplifications in cBioPortal, and they correspond to 0, 1, 2, a few additional copies, and more than a few additional copies of the gene/region of interest, respectively [49]. Glioblastoma and glioblastoma multiforme had similar patterns of NAGS, CPS1, and citrin CNV (Figure 4). NAGS and CPS1 loci were diploid in 82% and 89% of glioblastoma samples, respectively, while 5-10% of glioblastoma samples exhibited either shallow deletions or gains of NAGS and CPS1 genes (Figure 4A). Most of the glioblastoma samples (80%) exhibited gains of the citrin gene (Figure 4A). The citrin gene was diploid in 18% of glioblastoma samples (Figure 4A). The remaining glioblastoma samples had amplifications of the citrin gene (Figure 4A). There was no association between CNVs and the expression of NAGS and CPS1 mRNA in glioblastoma (Figure 4B,C). The expression of citrin mRNA was higher in glioblastoma samples with a higher copy number of the citrin gene (Figure 4D).
The majority of glioblastoma multiforme samples were diploid for NAGS and CPS1 loci and exhibited gains of the citrin gene ( Figure 4E). The remaining glioblastoma multiforme samples exhibited either shallow deletions or gains of the NAGS and CPS1 genes, and either gains or amplifications of the citrin gene ( Figure 4E). There was no association between CNVs and the expression of NAGS and CPS1 mRNA in glioblastoma multiforme samples ( Figure 4F,G). The expression of citrin mRNA was higher in the glioblastoma multiforme samples with a higher copy number of the citrin gene ( Figure 4H).
More than 90% of the lung adenocarcinoma samples were diploid for NAGS, CPS1, and citrin genes (Figure 4I). The remaining lung adenocarcinoma samples exhibited deep and shallow deletions, gains, and amplifications of the three genes (Figure 4I).
The CNVs of NAGS and CPS1 were similar in stomach adenocarcinoma samples. The majority of the stomach adenocarcinoma samples, 64% and 71%, were diploid for NAGS and CPS1 genes, respectively; 20% and 16% of the stomach adenocarcinoma samples exhibited gains of NAGS and CPS1, respectively; 13% and 10% of stomach adenocarcinoma samples exhibited shallow deletions of NAGS and CPS1, respectively; and the remaining stomach adenocarcinoma samples had either deep deletions or amplifications of the two genes (Figure 4M). The majority of the stomach adenocarcinoma samples were either diploid (48%) or exhibited a gain (38%) of the citrin gene; the remaining samples exhibited shallow deletions, amplifications, and a loss of the citrin gene (Figure 4M). There was no association between the CNV and NAGS mRNA expression in the stomach adenocarcinoma samples (Figure 4N). The expression of CPS1 mRNA was lower in the stomach adenocarcinoma samples with higher copy numbers of the CPS1 gene (Figure 4O).
The expression of citrin mRNA was higher in the stomach adenocarcinoma samples with higher copy number of the citrin gene ( Figure 4P).
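The comparison of mRNA expression across copy number classes in this section can be illustrated with a short sketch. The integer codes below follow the GISTIC-style convention that cBioPortal uses for discrete copy number calls (-2 to 2); that encoding, the function name, and the use of group medians as the summary are assumptions of this example and are not taken from the text.

```python
from statistics import median

# Assumed GISTIC-style codes for the five CNV classes named in the text.
CNV_LABELS = {
    -2: "deep deletion",
    -1: "shallow deletion",
    0: "diploid",
    1: "gain",
    2: "amplification",
}

def median_expression_by_cnv(samples):
    """Group (cnv_code, mrna_expression) pairs by CNV class.

    Returns {class label: median expression} for the classes present.
    """
    grouped = {}
    for code, expression in samples:
        grouped.setdefault(CNV_LABELS[code], []).append(expression)
    return {label: median(values) for label, values in grouped.items()}
```

A monotonic increase of the medians from deletions to amplifications would correspond to the pattern reported here for citrin; a formal test of that trend is left out of this sketch.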

Germline and Somatic NAGS, CPS1, and Citrin Sequence Variants
Metabolic dysregulation contributing to increased de novo pyrimidine biosynthesis could result from sequence variants that increase the activity of NAGS, CPS1, and citrin without the need for the overexpression of the corresponding genes. Therefore, we intended to compare functional effects of NAGS, CPS1, and citrin somatic sequence variants found in glioblastoma, glioblastoma multiforme, lung adenocarcinoma, and stomach adenocarcinoma with functional effects of germline variants found in patients with respective urea cycle disorders, and in individuals without known rare diseases. However, the fraction of glioblastoma, glioblastoma multiforme, lung adenocarcinoma, and stomach adenocarcinoma samples with sequence variants in NAGS, CPS1, and citrin was low (Table 1). Therefore, we compared the types of NAGS, CPS1, and citrin sequence variants found in all tumor samples with sequence variants found in individuals without known rare diseases, and patients with NAGS, CPS1, or citrin deficiencies to increase the power of comparison. NAGS, CPS1, and citrin sequence variants found in tumor samples were collected from the TCGA [50], COSMIC [51], and cBioPortal [42,43] databases of tumor genomic information. NAGS, CPS1, and citrin sequence variants present in individuals without rare genetic disorders were collected from the gnomAD database [52]. Pathogenic, likely pathogenic sequence variants, variants of uncertain significance, and variants with conflicting interpretation found in patients with NAGS, CPS1, or citrin deficiency were collected from ClinVar and LOVD [53] databases as well as from published case reports ( Figure 5A). For all three genes, there was little overlap between variants found in tumors, patients, and gnomAD ( Figure 5B). This is not surprising since mechanisms of mutagenesis differ in germline and tumor cells that often exhibit aberrant mismatch repair and proofreading [54].
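The overlap analysis summarized in Figure 5B amounts to counting the regions of a three-set Venn diagram. A minimal sketch follows; the function name and the variant identifiers in the usage note are hypothetical, and it assumes that variants from all sources have already been normalized to one common notation so that identical variants compare equal.

```python
def variant_overlap(tumor, patient, gnomad):
    """Count variants in each region of a three-set Venn diagram."""
    t, p, g = set(tumor), set(patient), set(gnomad)
    return {
        "tumor_only": len(t - p - g),
        "patient_only": len(p - t - g),
        "gnomad_only": len(g - t - p),
        "tumor_patient": len((t & p) - g),
        "tumor_gnomad": len((t & g) - p),
        "patient_gnomad": len((p & g) - t),
        "all_three": len(t & p & g),
    }
```

For example, `variant_overlap({"v1", "v2"}, {"v2"}, {"v3"})` uses placeholder ids rather than real variants; the seven counts always sum to the size of the union of the three sets.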
Although there was little overlap between sequence variants found in patients with NAGS, CPS1, or citrin deficiencies, tumors, and gnomAD, the frequencies of different types of sequence variants were similar in the three groups of samples (Table 2 and Figure 5C). Variants were classified based on their location in genes (regulatory, 5′-UTR, intron, and …). Loss-of-function (LOF) variants that include nonsense, frameshift, and splicing variants were more frequent in patients with NAGS, CPS1, or citrin deficiency than in tumors and gnomAD (Figure 5C and Table 2). A higher frequency of LOF variants in patients with NAGS, CPS1, or citrin deficiency than in tumors and gnomAD is expected, since carriers of LOF variants in any of the three genes have normal ureagenesis and phenotype [10]. Very small fractions of sequence variants were found in the 5′-UTRs, splice regions, and 3′-UTRs of all three genes (Figure 5C and Table 2). This is not surprising since these regions are minor portions of NAGS, CPS1, and citrin genes. Sequence variants found in NAGS, CPS1, and citrin introns were common in tumors and gnomAD (Figure 5C and Table 2). This is likely due to the whole genome sequencing of DNA from tumor samples and gnomAD, while sequencing of patient DNA focuses on the coding regions and canonical splice sites where most disease-causing sequence variants are found. Both absolute and relative numbers of sequence variants found in CPS1 and citrin introns were higher than in NAGS introns (Figure 5C and Table 2), likely due to the small size of NAGS introns compared to CPS1 and citrin introns. Synonymous NAGS and CPS1 sequence variants were more frequent in tumor samples and gnomAD than in patients with NAGS or CPS1 deficiency, while citrin synonymous sequence variants were less frequent in gnomAD than in patients and tumor samples (Figure 5C and Table 2).
Surprisingly, whole genome sequencing of DNA from tumor samples and gnomAD did not reveal many variants in the upstream regulatory regions ( Figure 5C and Table 2). This could be due to the poor annotation of gene regulatory regions in the current assembly of the human genome. Missense variants found in all three genes were either the largest or the second largest fraction of all sequence variants in patients, tumors, and gnomAD ( Figure 5C and Table 2). Similar fractions of NAGS and citrin variants found in patients and tumor samples were missense variants, while they represented a slightly lower fraction of NAGS and citrin variants in gnomAD ( Figure 5C and Table 2). This pattern was different for CPS1 missense variants. Similar fractions of CPS1 variants found in tumor samples and gnomAD were missense variants, while they represented the highest fraction of CPS1 variants found in patients with CPS1 deficiency ( Figure 5C and Table 2).
There was more overlap between missense variants found in patients and gnomAD than between somatic missense variants found in tumors and germline variants found in patients and gnomAD ( Figure 6A). The REVEL functional effect predictor was used to evaluate the effects of missense variants on protein function. Missense variants with a REVEL score above 0.5 are considered damaging while missense variants with a REVEL score below 0.5 are considered tolerated [55]. The median REVEL scores of NAGS, CPS1, and citrin missense variants found in patients were all above 0.5 ( Figure 6B-D). This is consistent with the reduced or absent enzymatic activity and/or decreased stability of mutant NAGS, CPS1, and citrin proteins found in patients with respective deficiencies.
The median REVEL scores of NAGS missense variants found in tumors and gnomAD are below 0.5, suggesting that most of these variants do not affect NAGS protein function ( Figure 6B). REVEL scores of CPS1 and citrin missense variants found in tumors and gnomAD are above 0.5 but lower than the median REVEL score in patients ( Figure 6C,D). This likely reflects a higher conservation of CPS1 and citrin proteins across phyla [56] and the weight given to amino acid conservation by REVEL [55].
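The median-based comparison above can be expressed as a small helper that applies the 0.5 REVEL cutoff stated in the text. The function name and return layout are illustrative assumptions; a score of exactly 0.5 is counted as tolerated here because the text defines only the cases above and below the cutoff.

```python
from statistics import median

REVEL_CUTOFF = 0.5  # cutoff quoted in the text [55]

def summarize_revel(scores, cutoff=REVEL_CUTOFF):
    """Return the median REVEL score and the fraction of scores above the cutoff."""
    scores = list(scores)
    if not scores:
        raise ValueError("no REVEL scores supplied")
    damaging = sum(1 for s in scores if s > cutoff)
    return {
        "median": median(scores),
        "fraction_damaging": damaging / len(scores),
    }
```

Running this separately on the patient, tumor, and gnomAD missense variants of one gene gives the three medians that are compared against 0.5 in Figure 6B-D.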
The strength of evidence for damaging and tolerated/benign effects of a missense variant on protein function can be inferred from its REVEL score [57]. The distribution of NAGS REVEL scores differs in patients vs. gnomAD and tumor samples. REVEL scores for the majority of NAGS missense variants in all three groups of samples indicate an uncertain effect on NAGS function. The second highest fraction (29%) of NAGS missense variants have REVEL scores indicative of moderate support for the damaging effect on NAGS function ( Figure 6E).
The largest fraction of CPS1 missense variants found in patients with CPS1 deficiency (40%) have REVEL scores higher than 0.932, suggesting strong evidence for damaging effects on CPS1 function (Figure 6F). Between 25 and 30% of sequence variants in all three groups of samples have REVEL scores that indicate moderate evidence for damaging effects on CPS1 function (Figure 6F). The majority of CPS1 missense variants in gnomAD and tumor samples have REVEL scores that indicate an uncertain effect on CPS1 function (Figure 6F). Two-thirds of citrin missense variants found in patients with a citrin deficiency have REVEL scores that suggest either a moderate or uncertain effect on citrin function (Figure 6G). The majority of citrin missense variants in gnomAD and tumor samples have REVEL scores that indicate an uncertain effect on citrin function (Figure 6G). None of the missense variants found in NAGS, CPS1, and citrin genes had REVEL scores that indicate either strong or very strong evidence for a benign effect on protein function.
Color key (figure legend): brown, strong evidence for damaging effect on protein function; orange, moderate evidence for damaging effect on protein function; light orange, supporting evidence for damaging effect on protein function; gray, uncertain effect on protein function; light blue, supporting evidence for benign effect on protein function; dark blue, moderate evidence for benign effect on protein function.
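The evidence-strength categories discussed above can be sketched as a binning of REVEL scores. Only the 0.932 cutoff for strong evidence of a damaging effect appears in the text; the remaining cutoffs in the code are assumptions based on commonly cited REVEL calibration intervals and should be checked against reference [57] before any reuse. The category names and the handling of scores that fall exactly on a boundary are likewise choices made for this example.

```python
from collections import Counter

def revel_evidence(score):
    """Map a REVEL score to an evidence category.

    Only 0.932 is quoted in the text; every other cutoff is an assumed value.
    """
    if score >= 0.932:   # from the text (boundary handling assumed)
        return "damaging_strong"
    if score >= 0.773:   # assumed cutoff
        return "damaging_moderate"
    if score >= 0.644:   # assumed cutoff
        return "damaging_supporting"
    if score > 0.290:    # assumed cutoff
        return "uncertain"
    if score > 0.183:    # assumed cutoff
        return "benign_supporting"
    if score > 0.016:    # assumed cutoff
        return "benign_moderate"
    if score > 0.003:    # assumed cutoff
        return "benign_strong"
    return "benign_very_strong"

def evidence_fractions(scores):
    """Return {category: fraction of variants} for a list of REVEL scores."""
    scores = list(scores)
    counts = Counter(revel_evidence(s) for s in scores)
    return {category: n / len(scores) for category, n in counts.items()}
```

Applied to one gene and one sample group at a time, `evidence_fractions` yields the kind of per-category percentages that are compared across patients, tumors, and gnomAD in Figure 6E-G.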

NAGS, CPS1, and Citrin Expression in Individual Tumor Samples
NAG, produced by NAGS, is required for CPS1 function in vivo [11-18,58,59]. Increased production of carbamylphosphate (CP) due to the overexpression of CPS1, together with higher cytoplasmic abundance of aspartate due to the overexpression of citrin, could contribute to a higher CAD activity and de novo pyrimidine synthesis. Therefore, we examined whether individual glioblastoma, glioblastoma multiforme, lung adenocarcinoma, and stomach adenocarcinoma samples exhibit the overexpression of NAGS, CPS1, and citrin mRNA. The correlation between NAGS and CPS1, citrin and NAGS, and CPS1 and citrin mRNA expression in individual tumor samples was either weak or negligible (Figure 7). CPS1 and citrin protein abundance data were available for individual glioblastoma and lung adenocarcinoma samples [60,61]. The correlation between CPS1 and citrin protein abundance was weak in individual glioblastoma and lung adenocarcinoma samples (Figure 8). These results suggest that an increased cytoplasmic supply of either CP or aspartate may be sufficient for a higher de novo pyrimidine biosynthesis, as well as the need for an alternative explanation of CPS1 activity in the absence of NAGS expression and NAG.
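The pairwise comparison of expression in individual samples can be illustrated with a rank correlation. The text does not state which correlation coefficient was used, so Spearman's coefficient is an assumption here, and the function names are hypothetical. The sketch uses only the standard library.

```python
import math

def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Here `x` and `y` would be the per-sample expression values of two of the three genes, listed in the same sample order; values near zero correspond to the weak or negligible correlations reported in Figure 7.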

Discussion
This study used data mining to assess whether the aberrant expression of NAGS and biosynthesis of NAG can explain the increased activity of CPS1 that contributes to higher de novo pyrimidine production and tumor cell proliferation. We also assessed whether tumors that overexpress NAGS and CPS1 exhibit a higher expression of citrin, which can increase the cytoplasmic supply of aspartate for the de novo biosynthesis of pyrimidine nucleotides. The project was conceived and carried out during the COVID-19 pandemic when access to laboratory research was limited. We asked a series of questions that could be answered using data mining and bioinformatic approaches and used the answers to generate experimentally testable hypotheses.
The urea cycle can catalyze the formation of approximately 20 g of urea nitrogen, which corresponds to 43 g of urea per day in an average man [62]. Urea cycle enzymes and transporters are among the most abundant proteins in periportal hepatocytes [63]. The abundances of NAGS, CPS1, and citrin mRNA and proteins in the lung, brain, esophagus, and stomach tissues are 10-1000 times lower than in the liver, and the expression of the three genes is two- to four-fold higher in glioblastoma, glioblastoma multiforme, lung adenocarcinoma, stomach adenocarcinoma, and stomach and esophagus carcinoma samples. Although markedly lower than in the liver, the expression of CPS1 in lung cancers and p53-depleted tumor cell lines can sufficiently alter cellular homeostasis towards the increased de novo biosynthesis of pyrimidine nucleotides via a higher CAD activity [31,32,34]. The NAGS protein is expressed in two cell lines commonly used to model non-small cell lung carcinoma [36]; this provides a molecular mechanism for CPS1 activity and increased production of CP in the two cell lines. Since CPS1 is a mitochondrial enzyme, CP produced by CPS1 in the mitochondria of tumor cells would have to be translocated into the cytoplasm to enter de novo pyrimidine biosynthesis. Excess CP, produced in hepatocytes of patients with OTC deficiency, can exit mitochondria and enter pyrimidine biosynthesis, leading to the accumulation of orotic acid, which is an intermediate of pyrimidine biosynthesis and a biomarker of OTC deficiency [10,11].
ARALAR1 is a citrin paralog with identical biochemical function [22]. ARALAR1 is highly expressed in the brain, skeletal muscle and heart, present in other tissues and absent from the liver, the site of high citrin expression [22]. An increased supply of cytoplasmic aspartate due to the ectopic overexpression of citrin could disrupt the balance of cytoplasmic metabolites, resulting in a higher CAD activity, higher de novo pyrimidine biosynthesis, and metabolic reprogramming that leads to increased cell proliferation. In addition to supplying cytoplasmic aspartate for increased CAD activity, the overexpression of citrin in tumor cells could lead to metabolic reprogramming in tumor cells through the disruption of the malate-aspartate shuttle and energy production [29,30]. Therefore, we hypothesize that the overexpression of NAGS, CPS1, and citrin promotes the proliferation of glioblastoma, glioblastoma multiforme, stomach adenocarcinoma, and stomach and esophagus carcinoma cells through the increased biosynthesis of pyrimidine nucleotides and/or dysregulated malate-aspartate shuttle. This hypothesis could be tested in glioblastoma, stomach, and esophageal cancer cell lines by determining whether they express NAGS, CPS1, and/or citrin genes and proteins, followed by measuring the rate of cell proliferation and metabolite concentrations after knocking down the expression of the three genes.
Copy number variation, aberrant epigenetic regulation, and gain-of-function sequence variants are molecular mechanisms that could contribute to the high activity of NAGS, CPS1, and citrin in tumor cells that originate from tissues that normally do not express the three genes. An increased gene copy number appears to be responsible for citrin overexpression in glioblastoma, glioblastoma multiforme, lung adenocarcinoma, and stomach adenocarcinoma samples. Unlike the four tumor types examined here, amplification of the chromosomal region that harbors the citrin locus has been observed in hepatocellular carcinoma samples, but the abundance of citrin mRNA was similar in the tumors and neighboring normal tissue [64].
An altered chromatin structure of NAGS, CPS1, and citrin loci in tumors originating from the brain, lung, stomach, or esophagus tissues would allow the binding of transcription factors and RNA polymerase II to promoters and enhancers of the three genes. Histone modifications, CTCF, and RNA polymerase II binding to NAGS, CPS1, and citrin regulatory regions in the liver, lung, and A549 lung adenocarcinoma cells suggest that all three genes appear to be poised for expression in the lung tissue. Reporter gene assays have shown that the NAGS promoter functions in the A549 cell line, but the −3 kb NAGS enhancer does not [45]. This could explain the low level of NAGS expression in the lung. Therefore, we hypothesize that CPS1 and citrin promoters could function in the glioblastoma, glioblastoma multiforme, stomach adenocarcinoma, and stomach and esophagus carcinoma cells. This hypothesis could be tested with expression constructs in which NAGS, CPS1, or citrin promoters and enhancers control reporter gene expression in cell lines that model glioblastoma, glioblastoma multiforme, stomach adenocarcinoma, and stomach and esophagus carcinoma.
There was little overlap between somatic sequence variants found in tumor samples and germline sequence variants found in patients with NAGS, CPS1, and citrin deficiencies, and in individuals without rare genetic diseases. This is not surprising, because defective DNA proofreading and mismatch repair as well as the imbalance of purine and pyrimidine pools in tumor cells result in different types of nucleotide replacements [54]. Gain-of-function sequence variants in gene regulatory elements can cause the increased expression and/or activity of NAGS, CPS1, and citrin in tumors. The data in the databases of somatic sequence variants were insufficient to evaluate whether sequence variants in NAGS, CPS1, and citrin promoters and enhancers contribute to the overexpression of the three genes in tumors. NAGS, CPS1, and citrin missense variants found in tumors are predicted to be less likely to damage protein function than missense variants found in patients with respective gene deficiencies. This could reflect the limitation of computational methods to predict gain-of-function effects of amino acid replacements. Gain-of-function missense variants that result in a higher enzymatic activity of mutant proteins have been observed in urea cycle enzymes. A high-throughput functional assay of all single nucleotide variant (SNV)-accessible amino acid replacements in human OTC revealed that around 5% of all OTC variants have up to 30% higher activity than wild-type human OTC [65]. Furthermore, human OTC with two common p.R46K and p.Q270R sequence variants had 40% higher enzymatic activity than the wild-type or either of the single mutant proteins [66]. Therefore, we hypothesize that some of the NAGS, CPS1, and citrin missense variants found in tumors can increase the activity of mutant proteins. This hypothesis could be tested in a high-throughput functional screen in S. cerevisiae, since this organism has homologs of NAGS, CPS1, and citrin [67,68].
Our analysis did not show an association between aberrant citrin expression and unfavorable outcomes in patients with glioblastoma, glioblastoma multiforme, and lung adenocarcinoma. There was no association between the expression of NAGS, CPS1, or citrin and stomach adenocarcinoma patient outcomes. The number of tumor samples with aberrant expression of two out of the three genes analyzed in this study was insufficient for evaluation of patient outcomes. The association between NAGS overexpression and unfavorable outcomes in patients with glioblastoma and glioblastoma multiforme suggests that the aberrant overexpression of NAGS and production of NAG could promote tumor development by activating CPS1 without the requirement for a correlated expression of NAGS and CPS1 in tumor cells. Alternatively, aberrant overexpression of NAGS could contribute to poor outcomes for patients with glioblastoma and glioblastoma multiforme independently of CPS1. NAG has been detected in the cytoplasm of mammalian and avian brain cells [69,70], but the function of NAG in the brain remains poorly understood. The abundance of NAG in the mammalian brain is similar to that in the liver [69]. NAG that is produced in the liver mitochondria is transported to the cytoplasm for degradation [71]. A similar transport mechanism may operate in brain cells, allowing NAG produced in the mitochondria by the aberrantly overexpressed NAGS to be transported into the cytoplasm, increase the pool of cytoplasmic NAG, and contribute to metabolic reprogramming and tumorigenesis.
Our analysis replicated the association between CPS1 overexpression and unfavorable outcomes in patients with lung adenocarcinoma [31]. However, the association between lower NAGS expression and unfavorable outcomes in patients with lung adenocarcinoma is consistent with the absence of correlated NAGS and CPS1 expression in this tumor type. These observations were supported by the very weak correlation between NAGS, CPS1, and citrin expression in the individual glioblastoma, glioblastoma multiforme, lung adenocarcinoma, and stomach adenocarcinoma samples. This raises the question regarding the mechanism behind aberrant CPS1 activity that contributes to metabolic reprogramming towards increased de novo pyrimidine biosynthesis in tumor cells. One possibility is that CPS1 missense variants found in tumors enable CPS1 enzymatic activity without NAG. This hypothesis could be tested in a high-throughput functional activity screen in yeast cells that do not produce NAG. Another possibility is the activation of CPS1 in tumor cells by N-carbamylglutamate (NCG) produced by the human microbiota. NCG is a structural analog of NAG that can cross plasma and mitochondrial membranes to bind and activate CPS1 in patients with a NAGS deficiency [59]. NCG is an intermediate of histidine catabolism in bacteria and has been detected in metabolomes from gut and oral bacteria [72][73][74]. Therefore, it is possible that NCG of microbial origin can activate CPS1 in tumor cells.
The connection between increased de novo biosynthesis of pyrimidine nucleotides, tumor cell proliferation, and aberrant CPS1 expression and activity has made CPS1 an attractive target for antitumor drugs. AT067-H09 and H3B-120 inhibit CPS1 in vitro, in cultured hepatocytes, and in cultured non-small cell lung carcinoma cell lines, and appear to be promising antitumor drug candidates [36,75]. In addition to inhibiting aberrantly expressed CPS1 in tumor cells, AT067-H09 and H3B-120 could inhibit hepatic CPS1 and block the urea cycle, leading to hyperammonemia, which can cause irreversible brain injury [10]. Therefore, preclinical and clinical testing of these inhibitors will require close collaboration between oncologists and physicians who treat patients with urea cycle disorders or with acute and chronic liver failure, to avoid hyperammonemia as a serious side effect of tumor treatment.

Expression Patterns of Urea Cycle Genes
The Firehose Data portal, developed by the Genome Data Analysis Center (GDAC) at the Broad Institute, was used to query gene expression profiles of tumors and matched normal tissues for the expression levels of all six urea cycle enzymes and two transporters. Median log2 RSEM values [37,38] for tumor samples and the matched normal tissues were extracted using the Firebrowse API (Supplementary Table S1). The median log2 RSEM values for urea cycle genes in tumor and matched normal tissue samples were determined by GDAC. The median log2 RSEM values from 28 tumor samples and matched normal tissues were used to calculate the fold-change values. For each urea cycle gene in each tumor type, the median log2 RSEM value for the matched normal tissue samples was subtracted from the median log2 RSEM value in tumor samples. The fold-change values were then calculated as two raised to the power of the difference between the median log2 RSEM values in tumor samples and matched normal tissues.
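The fold-change calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name, the gene labels, and the numeric values are hypothetical and are not taken from Supplementary Table S1.

```python
def fold_change(median_log2_tumor: float, median_log2_normal: float) -> float:
    """Return the tumor/normal fold change from two median log2 RSEM values.

    The normal-tissue median is subtracted from the tumor median, and two is
    raised to the power of that difference.
    """
    return 2.0 ** (median_log2_tumor - median_log2_normal)


# Hypothetical median log2 RSEM values (tumor, matched normal), illustration only.
example_medians = {
    "GENE_A": (10.0, 8.0),  # tumor median is 2 log2 units higher
    "GENE_B": (5.0, 6.0),   # tumor median is 1 log2 unit lower
}

fold_changes = {
    gene: fold_change(tumor, normal)
    for gene, (tumor, normal) in example_medians.items()
}
print(fold_changes)
```

A fold change above 1 indicates higher median expression in the tumor, and a value below 1 indicates lower median expression than in the matched normal tissue.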
The abundance of urea cycle gene transcripts in the liver, small intestine, stomach, esophageal mucosa and smooth muscle, cerebellum, cerebral cortex, and lung tissues was obtained from the GTEx Project database (Supplementary Table S2). The GTEx database (https://gtexportal.org/home/datasets) was accessed on 26 June 2022. The abundance of urea cycle enzymes and transporters in the liver, small intestine, stomach, esophagus, brain, and lung tissues was obtained from the GTEx Project [41] and ProteomicsDB [39,40], which were accessed on 23 June 2022 (Supplementary Tables S3 and S4, respectively).

Patient Outcome Data
The correlation between patient outcomes and NAGS, CPS1, or citrin mRNA expression levels was available for select studies of glioblastoma [81], glioblastoma multiforme [79], lung adenocarcinoma [79,80], and stomach adenocarcinoma [79] at the cBioPortal. NAGS, CPS1, and citrin gene-specific charts were selected for each study. Gene-specific expression data for NAGS, CPS1, or citrin were used to define patient groups based on the quartiles of gene expression levels. Kaplan-Meier survival curves were compared for patients in the highest and the lowest expression quartile for each gene (Supplementary Table S6).
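As a rough illustration of the quartile-based grouping and survival comparison described above, the following Python sketch splits samples into the lowest and highest expression quartiles and computes a Kaplan-Meier estimate for one group. The survival curves in this study were obtained at the cBioPortal; the function names, the inclusive quartile method, and the example values here are assumptions made for illustration, and no statistical test for comparing curves is shown.

```python
from statistics import quantiles


def quartile_groups(expression):
    """Split sample IDs into lowest- and highest-quartile groups.

    expression: dict mapping sample ID -> expression value.
    The 'inclusive' quantile method is an assumption of this sketch.
    """
    q1, _, q3 = quantiles(expression.values(), n=4, method="inclusive")
    low = sorted(s for s, v in expression.items() if v <= q1)
    high = sorted(s for s, v in expression.items() if v >= q3)
    return low, high


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with an observed event.

    times: follow-up times; events: 1 if death was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        seen_at_t = 0
        # Collect all subjects sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            seen_at_t += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored subjects leave the risk set.
        at_risk -= seen_at_t
    return curve


# Hypothetical expression values for eight samples, illustration only.
expression = {f"S{i}": float(i) for i in range(1, 9)}
low_group, high_group = quartile_groups(expression)
print(low_group, high_group)

# Hypothetical follow-up times (months) and event indicators for one group.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

The same estimator would be applied separately to the lowest- and highest-quartile groups for each gene before the two curves are compared.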

Sequence Variant Data
CNV data for NAGS, CPS1, and citrin were available at the cBioPortal for select studies of glioblastoma [61,81], glioblastoma multiforme [50], lung adenocarcinoma [60,78,80], and stomach adenocarcinoma [50]. The number of samples harboring deep deletions, shallow deletions, two copies, gain and amplification of NAGS, CPS1, or citrin genes (Supplementary Table S7), as well as the correlation between the type of CNV and NAGS, CPS1, or citrin expression levels (Supplementary Table S8) were obtained for the four tumor types.
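The per-category sample counts can be tallied as in the Python sketch below. It assumes the integer coding conventionally used for discrete copy-number calls (-2, -1, 0, 1, 2); the function name and the example records are hypothetical. The median expression per category gives only a first look at the relationship between copy number and expression and is not the correlation statistic reported in Supplementary Table S8.

```python
from collections import Counter
from statistics import median

# Assumed integer coding of discrete copy-number calls.
CNV_LABELS = {
    -2: "deep deletion",
    -1: "shallow deletion",
    0: "two copies",
    1: "gain",
    2: "amplification",
}


def summarize_cnv(samples):
    """Count samples per CNV category and report median expression per category.

    samples: iterable of (cnv_code, expression_value) pairs.
    """
    counts = Counter()
    expression_by_label = {}
    for code, expression_value in samples:
        label = CNV_LABELS[code]
        counts[label] += 1
        expression_by_label.setdefault(label, []).append(expression_value)
    medians = {label: median(vals) for label, vals in expression_by_label.items()}
    return counts, medians


# Hypothetical (CNV code, expression) records for one gene, illustration only.
records = [(0, 8.0), (0, 9.0), (1, 10.0), (1, 12.0), (2, 14.0), (-1, 6.0)]
counts, medians = summarize_cnv(records)
print(dict(counts))
print(medians)
```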
The following databases and websites were queried in September 2021 to collect NAGS, CPS1, and citrin single nucleotide sequence variants and small indels found in (1) patients with NAGS, CPS1, or citrin deficiency; (2) tumor samples; and (3) individuals without rare genetic diseases: ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/ (accessed on 1 May 2021)), Leiden Open Variant Database (LOVD) [53], COSMIC [51], cBioPortal [42,43], TCGA Data Portal [50], and gnomAD [52]. Published reports of sequence variants found in patients with NAGS, CPS1, and citrin deficiencies were also included in the analysis. NAGS, CPS1, and citrin sequence variants annotated as pathogenic, likely pathogenic, variants of unknown significance, or variants with conflicting interpretation in ClinVar and/or LOVD were included in the analysis. VariantValidator [90] was used to format all sequence variants found in patients, tumors, and healthy individuals (Supplementary Table S9). Sequence variants were classified by type and location in the NAGS, CPS1, and citrin genes: regulatory (located upstream of the transcription start site), 5′-UTR, missense, nonsense, frameshift, in-frame indel, synonymous, splicing (affecting canonical GT and AG splice signals at the 5′- and 3′-ends of introns), splice region (located up to 10 bp downstream and upstream of the GT and AG splice signals, respectively), intronic (located more than 10 bp away from the canonical splice sites), 3′-UTR, and unknown. Nonsense, frameshift, and splicing sequence variants were considered loss-of-function variants. REVEL [55] scores of NAGS, CPS1, and citrin missense variants, obtained through the Ensembl Variant Effect Predictor tool [91], were used to evaluate the effects of missense variants on protein function (Supplementary Table S10).
Missense variants with a REVEL score above 0.5 are considered damaging, while missense variants with a REVEL score below 0.5 are considered tolerated [55]. The ClinGen recommendations for relating the REVEL score of a missense variant to the strength of evidence that the missense variant has either a damaging or benign effect on protein function [57] were used to refine the functional effect predictions for NAGS, CPS1, and citrin variants found in patients, tumors, and gnomAD. A REVEL score greater than or equal to 0.932 is considered strong evidence for the damaging effect of a sequence variant on protein function [57]. A REVEL score between 0.773 and 0.932 is considered moderate evidence for the damaging effect of a sequence variant on protein function [57]. A REVEL score between 0.644 and 0.773 is considered supporting evidence for the damaging effect of a sequence variant on protein function [57]. The effect on protein function of a sequence variant with a REVEL score between 0.290 and 0.644 is uncertain [57]. REVEL scores between 0.183 and 0.290, 0.016 and 0.183, and 0.003 and 0.016 are considered, respectively, supporting, moderate, and strong evidence for a benign effect on protein function, while a REVEL score lower than 0.003 is considered very strong evidence for a benign effect on protein function [57].
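The mapping from a REVEL score to an evidence category can be written as a simple threshold lookup. The Python sketch below assumes the interval boundaries 0.932, 0.773, 0.644, 0.290, 0.183, 0.016, and 0.003; the function name, the category labels, and the treatment of scores that fall exactly on a boundary are assumptions of this sketch rather than details of the analysis.

```python
def revel_evidence(score: float) -> str:
    """Map a REVEL score (0 to 1) to an evidence-strength category.

    Boundary handling is an assumption: lower bounds are inclusive on the
    damaging side and upper bounds are inclusive on the benign side.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("REVEL scores lie between 0 and 1")
    if score >= 0.932:
        return "damaging: strong"
    if score >= 0.773:
        return "damaging: moderate"
    if score >= 0.644:
        return "damaging: supporting"
    if score > 0.290:
        return "uncertain"
    if score > 0.183:
        return "benign: supporting"
    if score > 0.016:
        return "benign: moderate"
    if score > 0.003:
        return "benign: strong"
    return "benign: very strong"


# Hypothetical REVEL scores, illustration only.
for example_score in (0.95, 0.70, 0.50, 0.10, 0.001):
    print(example_score, revel_evidence(example_score))
```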

Epigenetic Regulation Data
The ENCODE Project Functional Genomics Portal [92,93] was queried for the availability of data for DNase sensitivity and hypersensitivity sites, histone modifications, CTCF binding sites, and RNA polymerase II binding sites in the liver, lung, adult brain, and stomach tissues, as well as cancer cell lines that model glioblastoma, glioblastoma multiforme, lung adenocarcinoma, and stomach adenocarcinoma. The following filters were applied to the ENCODE Experimental Matrix: DNA binding and DNA accessibility for assay type; TF-ChIP-seq, Histone ChIP-seq, DNase-seq and ATAC-seq for assay title; Homo sapiens for organism; cell line and tissue for biosample classification; brain, liver, lung, stomach, A549, H54, A172 and M059J for biosample; liver, lung, brain, and stomach for organ. Histone ChIP-seq, TF-ChIP-seq, DNase-seq, and ATAC-seq data were available for liver, stomach, and lung tissues, and for the A549 lung adenocarcinoma cell line. Histone ChIP-seq, TF-ChIP-seq, and DNase-seq data were available for embryonic brain samples and brain samples from patients with Alzheimer's disease. BigWig fold-change files for biological and technical replicates for the following experiments were downloaded: CTCF ChIP-seq, RNAP2A ChIP-seq, DNase-seq, ATAC-seq, H3K27Ac ChIP-seq, and H3K4me3 ChIP-seq. Genomic coordinates corresponding to NAGS, CPS1, and citrin loci were determined based either on the location of CTCF binding sites or the adjacent upstream and downstream gene.
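Once a BigWig fold-change file has been downloaded, the signal over a locus can be read and reduced to binned means before plotting. The Python sketch below illustrates one way to do this; it assumes the third-party pyBigWig package, which is not named in this study, and the function names, bin count, and example values are hypothetical. The notebooks in the study's repository may work differently.

```python
from math import isnan


def binned_means(values, n_bins):
    """Average a per-base signal into n_bins roughly equal-width bins.

    NaN entries (positions without signal) are ignored; a bin that
    contains only NaN values yields NaN.
    """
    if n_bins < 1 or n_bins > len(values):
        raise ValueError("n_bins must be between 1 and len(values)")
    size = len(values)
    means = []
    for b in range(n_bins):
        start = b * size // n_bins
        end = (b + 1) * size // n_bins
        chunk = [v for v in values[start:end] if not isnan(v)]
        means.append(sum(chunk) / len(chunk) if chunk else float("nan"))
    return means


def locus_signal(bigwig_path, chrom, start, end, n_bins=100):
    """Read fold-change signal for one locus and return binned means.

    Assumes pyBigWig is installed; it is imported here so that
    binned_means can be used without it.
    """
    import pyBigWig

    bw = pyBigWig.open(bigwig_path)
    try:
        values = bw.values(chrom, start, end)  # one float per base, NaN if no data
    finally:
        bw.close()
    return binned_means(values, n_bins)


# Hypothetical per-base signal, illustration only.
print(binned_means([1.0, 3.0, float("nan"), 5.0], 2))
```

Binning keeps plots of loci that span tens or hundreds of kilobases readable while preserving the overall shape of the signal.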
A custom Python script (https://github.com/MIMOR02/bigwig-file-vizualizations (accessed on 10 June 2022)) was used to extract and graph the fold-change signal from BigWig files for the NAGS, CPS1, and citrin loci. There are three notebooks in the GitHub repository which can be run in JupyterLab or Google Colab: Data_Visualization-GITHUB_PUSH.ipynb, ENCODE_JSON_parser-GITHUB_PUSH.ipynb, and Format_Directory-GITHUB_PUSH.ipynb. ENCODE_JSON_parser-GITHUB_PUSH.ipynb scrapes an experiment matrix from the ENCODE Project website for the desired BigWig file download links. BigWig files were downloaded manually. Format_Directory-GITHUB_PUSH.ipynb creates a nested file system that all downloaded BigWig files will be sorted into. Data_Visualization-GITHUB_PUSH.ipynb reads in BigWig files, generates appropriate graphs to the user's specifications, and moves the visualizations into the file system created by Format_Directory-GITHUB_PUSH.ipynb.

Institutional Review Board Statement: Not applicable. Patient consent was waived due to the data collection from public databases.

Informed Consent Statement: Not applicable. Patient consent was waived due to the data collection from public databases.

Data Availability Statement:
All data used in this study have been included in Supplementary Files. Custom Python scripts used to acquire and analyze the data from the ENCODE Project are available at https://github.com/MIMOR02/bigwig-file-vizualizations (accessed on 10 June 2022).